Stability of Software Defect Prediction in Relation to Levels of Data Imbalance

نویسندگان

  • Tihana Galinac Grbac
  • Goran Mausa
  • Bojana Dalbelo Basic
چکیده

Software defect prediction is recognized as one of the most important ways to reach software development efficiency. The majority of costs during software development is spent on software defect detection activities, but their ability to guarantee software reliability is still limited. The analyses performed by [Andersson and Runeson 2007; Fenton and Ohlsson 2000; Galinac Grbac et al. 2013], in the environment of a large scale industrial software with high focus on reliability shows that faults are distributed within the system according to the Pareto principle. They prove that the majority of faults are concentrated in just small amount of system modules, and that these modules do not compose a majority of system size. This fact implies that software defect prediction would really bring benefits if a well performing model is applied. The main motivating idea is that if we were able to predict the location of software faults within the system, then we could plan defect detection activities more efficiently. This means that we would be able to concentrate defect detection activities and resources into critical locations within the system and not on the entire system. Numerous studies have already been performed aiming to find the best general software defect prediction model [Hall et al. 2012]. Unfortunately, a well performing solution is still absent. Data in software defect prediction are very complex, and do not follow in general any particular probability distribution that could provide a mathematical model. Data distributions are highly skewed, which is connected to the popular data imbalance problem, thus making standard machine learning approaches inadequate. Therefore, a significant research has recently been devoted to cope with this problem.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارائه یک روش فازی-تکاملی برای تشخیص خطاهای نرم‌افزار

Software defects detection is one of the most important challenges of software development and it is the most prohibitive process in software development. The early detection of fault-prone modules helps software project managers to allocate the limited cost, time, and effort of developers for testing the defect-prone modules more intensively.  In this paper, according to the importance of soft...

متن کامل

Using Class Imbalance Learning for Cross-Company Defect Prediction

Cross-company defect prediction (CCDP) is a practical way that trains a prediction model by exploiting one or multiple projects of a source company and then applies the model to target company. Unfortunately, the performance of such CCDP models is susceptible to the high imbalanced nature between the defect-prone and non-defect classes of CC data. Class imbalance learning is applied to alleviat...

متن کامل

Combining Particle Swarm Optimization based Feature Selection and Bagging Technique for Software Defect Prediction

The costs of finding and correcting software defects have been the most expensive activity in software development. The accurate prediction of defect‐prone software modules can help the software testing effort, reduce costs, and improve the software testing process by focusing on fault-prone module. Recently, static code attributes are used as defect predictors in software defect prediction res...

متن کامل

Heterogeneous Defect Prediction via Exploiting Correlation Subspace

Software defect prediction generally builds models from intra-project data. Lack of training data at the early stage of software testing limits the efficiency of prediction in practice. Thereby researchers proposed cross-project defect prediction using the data from other projects. Most previous efforts assumed the cross-project defect data have the same metrics set which means the metrics used...

متن کامل

On Software Defect Prediction Using Machine Learning

The goal of this paper is to catalog the software defect prediction using machine learning. Over the last few years, the eld of software defect prediction has been extensively studied because of it's crucial position in the area of software reliability maintenance, software cost estimation and software quality assurance. An insurmountable problem associated with software defect prediction is th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013